2-D Wavelet Transform Enhancement on General-Purpose Microprocessors: Memory Hierarchy and SIMD Parallelism Exploitation
Abstract
This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. The two topics are related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts. As experimental platforms we have employed a Pentium-III (P-III) and a Pentium-4 (P-4) microprocessor. Our SIMD-oriented tuning, however, has been performed exclusively at the source code level: we have reordered some loops and introduced some modifications that allow automatic vectorization. Given the abstraction level at which the optimizations are carried out, the speedups obtained on the investigated platforms are quite satisfactory, even though further improvement can be obtained by lowering the level of abstraction (using compiler intrinsics or assembly code).
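The loop reordering mentioned in the abstract can be illustrated with a minimal C sketch. It is not the authors' code: the Haar-style averaging and differencing step, the row-major layout, and the function names `vertical_step_by_column` and `vertical_step_by_row` are assumptions chosen only to keep the example short. The first version walks down each column with a stride of `cols` elements, which hurts locality; the second interchanges the loops so the inner loop is unit-stride and free of aliasing (via `restrict`), the kind of form a vectorizing compiler can usually handle.

```c
#include <assert.h>
#include <stddef.h>

/* Vertical filtering step on a row-major rows x cols image.
 * A Haar-style average/difference is used purely for illustration
 * (an assumption, not the filter used in the paper). */

/* Column-by-column traversal: the inner loop moves down a column,
 * so consecutive accesses are cols elements apart. */
void vertical_step_by_column(const float *in, float *low, float *high,
                             size_t rows, size_t cols)
{
    assert(rows % 2 == 0);
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows / 2; ++i) {
            float a = in[(2 * i) * cols + j];
            float b = in[(2 * i + 1) * cols + j];
            low[i * cols + j]  = 0.5f * (a + b);
            high[i * cols + j] = 0.5f * (a - b);
        }
    }
}

/* Interchanged loops: the inner loop runs along a row with unit stride,
 * and restrict tells the compiler the arrays do not overlap. */
void vertical_step_by_row(const float *restrict in, float *restrict low,
                          float *restrict high, size_t rows, size_t cols)
{
    assert(rows % 2 == 0);
    for (size_t i = 0; i < rows / 2; ++i) {
        const float *even = in + (2 * i) * cols;
        const float *odd  = in + (2 * i + 1) * cols;
        float *lo = low + i * cols;
        float *hi = high + i * cols;
        for (size_t j = 0; j < cols; ++j) {
            lo[j] = 0.5f * (even[j] + odd[j]);
            hi[j] = 0.5f * (even[j] - odd[j]);
        }
    }
}
```

Both functions compute the same result; only the traversal order differs. Whether the second form is actually vectorized depends on the compiler and its flags, so the compiler's vectorization report should be checked rather than assumed.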
Related Resources
Vectorization of the 2D Wavelet Lifting Transform Using SIMD Extensions
This paper addresses the vectorization of the lifting-based wavelet transform on general-purpose microprocessors in the context of JPEG2000. Since SIMD exploitation strongly depends on an efficient memory hierarchy usage, this research is based on previous work about cache-conscious DWT implementations [1,2,3]. The experimental platform on which we have chosen to study the benefits of the SIMD e...
Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions
Video compression algorithms based on the 3D wavelet transform obtain excellent compression rates at the expense of huge memory requirements, which drastically affect the execution time of such applications. The objective of this work is to allow real-time video compression based on the 3D fast wavelet transform. We show the hardware and software interaction for this multimedia application on a gene...
Wavelet Transform for Large Scale Image Processing on Modern Microprocessors
In this paper we discuss several issues relevant to the vectorization of a 2-D Discrete Wavelet Transform on current microprocessors. Our research is based on previous studies about the efficient exploitation of the memory hierarchy, due to its tremendous impact on performance. We have extended this work with a more detailed analysis based on hardware performance counters and a study of vectori...
Locality-Improved FFT Implementation on a Graphics Processor
The growing computational power of modern graphics processing units is making them very suitable for general-purpose computing. These commodity processors generally operate as parallel SIMD platforms and, among other factors, the effectiveness of the codes depends on proper exploitation of the underlying memory hierarchy. This paper deals with the implementation of the Fast Fourier Transfor...
Parallel Wavelet Transform for Large Scale Image Processing
In this paper we discuss several issues relevant to the parallel implementation of a 2-D Discrete Wavelet Transform (DWT) on general purpose multiprocessors. Our interest in this transform is motivated by its usage in an image fusion application which has to manage large image sizes, making parallel computing highly advisable. We have also paid much attention to memory hierarchy exploitation, s...